Graph Neural Network (GNN) research is growing rapidly, thanks to the capacity of GNNs to learn distributed representations from graph-structured data. However, centralizing massive amounts of real-world graph data for GNN training is prohibitive due to privacy concerns, regulatory restrictions, and commercial competition. Federated learning (FL), a trending distributed learning paradigm, offers a way to address this challenge while preserving data privacy. Despite recent advances in the vision and language domains, there is no suitable platform for the FL of GNNs. To this end, we introduce FedGraphNN, an open FL benchmark system that can facilitate research on federated GNNs. FedGraphNN is built on a unified formulation of graph FL and contains a wide range of datasets from different domains, popular GNN models, and FL algorithms, with secure and efficient system support. For the datasets in particular, we collect, preprocess, and partition 36 datasets from 7 domains, including both publicly available ones and specifically obtained ones such as hERG and Tencent. Our empirical analysis showcases the utility of our benchmark system while exposing significant challenges in graph FL: on most datasets with a non-IID split, federated GNNs perform worse than centralized GNNs, and the GNN model that attains the best result in the centralized setting may not maintain its advantage in the FL setting. These results imply that more research effort is needed to unravel the mystery behind federated GNNs. Moreover, our system performance analysis demonstrates that the FedGraphNN system is computationally efficient and secure for large-scale graph datasets. We maintain the source code at https://github.com/FedML-AI/FedGraphNN.
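The abstract mentions FL algorithms without spelling one out. As a minimal sketch, assuming FedAvg (a widely used aggregation rule, not necessarily FedGraphNN's own implementation) and model parameters simplified to plain lists of floats, the server-side step that combines locally trained GNNs into one global model looks roughly like this. The function name `fedavg` and the parameter key `gnn.layer0` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List

# Hypothetical simplification: a model is a mapping from parameter name to a
# flat list of floats. Real GNN weights would be multi-dimensional tensors.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Aggregate client models by a sample-size-weighted average (FedAvg-style).

    client_weights[k] holds client k's locally trained parameters;
    client_sizes[k] is the number of training samples (e.g. graphs) on client k.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    global_weights: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged = [0.0] * length
        for weights, n_k in zip(client_weights, client_sizes):
            if len(weights[name]) != length:
                raise ValueError(f"shape mismatch for parameter {name!r}")
            coeff = n_k / total  # clients holding more data get more influence
            for i, value in enumerate(weights[name]):
                averaged[i] += coeff * value
        global_weights[name] = averaged
    return global_weights


if __name__ == "__main__":
    # Two hypothetical clients holding unequal amounts of data (30 vs 10 graphs).
    client_a = {"gnn.layer0": [1.0, 2.0]}
    client_b = {"gnn.layer0": [5.0, 6.0]}
    print(fedavg([client_a, client_b], [30, 10]))  # {'gnn.layer0': [2.0, 3.0]}
```

Weighting by local sample count makes the average match what a single pooled gradient step would see when clients run one step each; under a non-IID split, clients run several local steps on differently distributed data, so the averaged model need not match the one centralized training would produce.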