Methods for training models on graphs distributed across multiple clients have recently grown in popularity, due to the size of these graphs as well as regulations requiring data to remain where it is generated, such as the EU's GDPR. However, a single connected graph cannot be disjointly partitioned onto multiple distributed clients because of the cross-client edges connecting graph nodes. Thus, distributed methods for training a model on a single graph incur either significant communication overhead between clients or a loss of information available to training. We introduce the Federated Graph Convolutional Network (FedGCN) algorithm, which uses federated learning to train GCN models for semi-supervised node classification on large graphs with fast convergence and little communication. Unlike prior methods that require communication among clients at each training round, FedGCN clients communicate with the central server only in a single pre-training step, greatly reducing communication costs. We theoretically analyze the tradeoff between FedGCN's convergence rate and communication cost under different data distributions and introduce a general framework that can be used to analyze all edge-completion-based GCN training algorithms. Experimental results show that our FedGCN algorithm achieves 51.7% faster convergence on average and at least 100× lower communication cost compared to prior work.
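To make the single pre-training communication step concrete, the following is a minimal sketch (not the authors' code) of how clients might exchange partial neighbor-feature aggregates through the server exactly once before local GCN training begins. The toy feature matrix, adjacency lists, node ownership, and helper function names are all illustrative assumptions.

```python
# Minimal sketch of a one-time pre-training exchange of neighbor aggregates.
# All data structures here are toy assumptions, not the paper's implementation.
import numpy as np

def client_local_sums(X, adj, owned_nodes):
    """Each client sums the features of the nodes it owns into the
    neighbor aggregates of every node its local edges touch."""
    sums = {}
    for u in owned_nodes:
        for v in adj[u]:
            # contribute owned node u's feature to neighbor v's aggregate
            sums.setdefault(v, np.zeros(X.shape[1]))
            sums[v] += X[u]
    return sums

def server_accumulate(per_client_sums):
    """The server adds the partial aggregates from all clients and
    broadcasts the completed sums back in a single communication round."""
    total = {}
    for sums in per_client_sums:
        for v, s in sums.items():
            total[v] = total.get(v, 0) + s
    return total

# Toy example: 4 nodes in a path graph, split across 2 clients.
X = np.eye(4)                        # one-hot node features
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clients = [[0, 1], [2, 3]]           # node ownership per client

partials = [client_local_sums(X, adj, owned) for owned in clients]
global_aggregates = server_accumulate(partials)

# After this single exchange, each client holds the cross-client neighbor
# information it needs and can train its local GCN (e.g., with FedAvg on
# model weights) without per-round feature communication.
print(global_aggregates[2])          # node 2's aggregate mixes both clients' features
```

In this sketch, node 2 (owned by the second client) receives feature contributions from node 1 (owned by the first client) without the clients ever communicating directly, which illustrates why per-round cross-client traffic can be avoided.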