Training Graph Convolutional Networks (GCNs) is expensive because it requires recursively aggregating data from neighboring nodes. To reduce this computation overhead, previous works have proposed various neighbor sampling methods that estimate the aggregation result from a small number of sampled neighbors. Although these methods successfully accelerate training, they mainly focus on the single-machine setting. Because real-world graphs are large, training GCNs in distributed systems is desirable. However, we found that existing neighbor sampling methods do not work well in a distributed setting. Specifically, a naive implementation may incur a huge amount of communication of feature vectors among different machines. To address this problem, we propose a communication-efficient neighbor sampling method. Our main idea is to assign higher sampling probabilities to local nodes so that remote nodes are accessed less frequently. We present an algorithm that determines the local sampling probabilities and ensures that the skewed neighbor sampling does not significantly affect the convergence of training. Our experiments on node classification benchmarks show that our method significantly reduces the communication overhead of distributed GCN training with little accuracy loss.
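To illustrate the core idea of locality-biased neighbor sampling, the sketch below shows one simple way to skew sampling toward local neighbors while keeping the aggregation estimate unbiased via importance weights. This is only a minimal illustration under assumptions: the fixed `local_bias` factor, the function name `sample_neighbors`, and the uniform-average aggregation target are hypothetical; the actual method determines the sampling probabilities with a dedicated algorithm rather than a constant bias.

```python
import numpy as np

def sample_neighbors(neighbors, is_local, num_samples, local_bias=4.0, rng=None):
    """Draw `num_samples` neighbors with a bias toward local ones.

    neighbors   : list of neighbor node ids
    is_local    : boolean array, True if a neighbor resides on this machine
    local_bias  : hypothetical constant; how much more likely a local
                  neighbor is to be drawn than a remote one

    Returns the sampled ids and per-sample importance weights w_s such that
    sum_s w_s * h[sampled[s]] is an unbiased estimate of the uniform
    neighbor average (1/n) * sum_j h[neighbors[j]].
    """
    rng = rng or np.random.default_rng()
    n = len(neighbors)
    # Local neighbors get `local_bias` times the score of remote ones,
    # so remote nodes (and hence cross-machine feature fetches) are drawn
    # less frequently.
    scores = np.where(np.asarray(is_local, dtype=bool), local_bias, 1.0)
    probs = scores / scores.sum()
    idx = rng.choice(n, size=num_samples, replace=True, p=probs)
    # Importance weights correct for the skewed sampling distribution.
    weights = 1.0 / (num_samples * n * probs[idx])
    return [neighbors[i] for i in idx], weights
```

The importance weighting keeps the estimator unbiased for any choice of probabilities, so the remaining design question, which the proposed algorithm addresses, is how to pick the skew so that communication drops without inflating the estimator's variance enough to hurt convergence.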