Distributed methods for training models on large graphs have recently grown in popularity, due both to the size of these graphs and to the private nature of graph data such as social networks. However, the graph structure means that a single graph cannot be disjointly partitioned across learning clients: any partition either incurs significant communication overhead between clients or loses information available to the training method. We introduce the Federated Graph Convolutional Network (FedGCN), which uses federated learning to train GCN models for semi-supervised node classification on large graphs with fast convergence and low communication cost. Compared to prior methods that require communication among clients at every training round, FedGCN preserves the privacy of client data and requires communication only at the initial step, which greatly reduces communication cost and speeds up convergence. We theoretically analyze the tradeoff between FedGCN's convergence rate and communication cost under different data distributions, and introduce a general framework for analyzing all edge-completion-based GCN training algorithms. Experimental results demonstrate the effectiveness of our algorithm and validate our theoretical analysis.
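To make the one-time-communication idea concrete, below is a minimal sketch in Python/NumPy of how such a scheme might look for a one-layer GCN trained with FedAvg-style weight averaging. The toy graph, the partitioning, and all names (`parts`, `agg`, learning rate, round counts) are illustrative assumptions, not the paper's implementation; real FedGCN also handles deeper GCNs via multi-hop aggregations.

```python
# Sketch: federated GCN training where cross-client neighbor information
# is exchanged ONCE at initialization, then all training rounds are local.
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: n nodes, d features, c classes, split across n_clients clients.
n, d, c, n_clients = 12, 8, 3, 3
X = rng.normal(size=(n, d))                     # node features
A = (rng.random((n, n)) < 0.3).astype(float)    # random adjacency
A = np.maximum(A, A.T)                          # make undirected
np.fill_diagonal(A, 1.0)                        # add self-loops
y = rng.integers(0, c, size=n)                  # node labels
parts = np.array_split(np.arange(n), n_clients) # disjoint node partition

# One-time communication step (the key idea): each client receives the
# aggregated neighbor features A @ X for its own nodes, including the
# contribution of neighbors stored on OTHER clients. After this step,
# no per-round cross-client messages are needed.
agg = A @ X  # in a real deployment, assembled across clients once

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

W = rng.normal(scale=0.1, size=(d, c))  # shared GCN weight matrix

for rnd in range(50):                   # federated training rounds
    local_Ws = []
    for nodes in parts:                 # each client trains locally
        Wl = W.copy()
        for _ in range(5):              # a few local gradient steps
            logits = agg[nodes] @ Wl    # 1-layer GCN: (A X) W
            p = softmax(logits)
            p[np.arange(len(nodes)), y[nodes]] -= 1.0  # dCE/dlogits
            grad = agg[nodes].T @ p / len(nodes)
            Wl -= 0.5 * grad
        local_Ws.append(Wl)
    W = np.mean(local_Ws, axis=0)       # server: FedAvg aggregation

acc = (np.argmax(agg @ W, axis=1) == y).mean()
print(f"toy training accuracy: {acc:.2f}")
```

Note how the only cross-client quantity is `agg`, computed before training begins; every subsequent round exchanges model weights with the server rather than graph data, which is what removes the per-round communication overhead between clients.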