Graph Neural Networks (GNNs) have become the method of choice for graph machine learning problems thanks to their ability to learn state-of-the-art representations from graph-structured data. However, centralizing massive amounts of real-world graph data for GNN training is prohibitive due to user-side privacy concerns, regulatory restrictions, and commercial competition. Federated Learning is the de-facto standard for collaboratively training machine learning models over many distributed edge devices without centralizing the data. Nevertheless, training graph neural networks in a federated setting remains ill-defined and poses both statistical and systems challenges. This work proposes SpreadGNN, a novel multi-task federated training framework that, for the first time in the literature, operates in the presence of partial labels and in the absence of a central server. SpreadGNN extends federated multi-task learning to realistic serverless settings for GNNs, and utilizes a novel optimization algorithm with a convergence guarantee, Decentralized Periodic Averaging SGD (DPA-SGD), to solve decentralized multi-task learning problems. We empirically demonstrate the efficacy of our framework on a variety of non-I.I.D. distributed graph-level molecular property prediction datasets with partial labels. Our results show that SpreadGNN outperforms GNN models trained with a central server-dependent federated learning system, even under constrained communication topologies. The source code is publicly available at https://github.com/FedML-AI/SpreadGNN
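To make the optimizer concrete, below is a minimal Python sketch of the periodic-averaging idea behind DPA-SGD: each client runs a few local SGD steps, then averages its parameters with its neighbors via a doubly stochastic mixing matrix over the communication topology, with no central server involved. The ring topology, quadratic toy losses, and hyperparameters here are illustrative assumptions, not the paper's exact GNN multi-task setup.

```python
import numpy as np

def dpa_sgd(init_params, W, grad_fn, lr=0.05, rounds=200, tau=5):
    """Sketch of Decentralized Periodic Averaging SGD (DPA-SGD).

    init_params: list of per-client parameter vectors.
    W: n x n doubly stochastic mixing matrix; W[i, j] > 0 only if
       clients i and j are neighbors in the topology.
    grad_fn(i, w): stochastic gradient of client i's local loss at w.
    tau: number of local SGD steps between communication rounds.
    """
    params = [w.copy() for w in init_params]
    n = len(params)
    for _ in range(rounds):
        # Local phase: each client takes tau SGD steps on its own data.
        for i in range(n):
            for _ in range(tau):
                params[i] = params[i] - lr * grad_fn(i, params[i])
        # Mixing phase: each client averages with its neighbors only.
        params = [sum(W[i, j] * params[j] for j in range(n))
                  for i in range(n)]
    return params

# Toy usage (hypothetical): 4 clients on a ring, each minimizing
# 0.5 * ||w - t_i||^2 for a client-specific target t_i.
n = 4
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i + 1) % n] = W[i, (i - 1) % n] = 1.0 / 3.0
targets = [np.array([float(i)]) for i in range(n)]
grad = lambda i, w: w - targets[i]
final = dpa_sgd([np.zeros(1) for _ in range(n)], W, grad)
print([p.round(3) for p in final])  # all clients end near the mean target, 1.5
```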