Graphs are widely used in data mining and machine learning because they naturally represent real-world objects and their interactions. As graphs grow larger, their subgraphs are often collected and stored separately in multiple local systems. It is therefore natural to consider the subgraph federated learning setting, in which each local system holds a small subgraph whose distribution may be biased relative to that of the whole graph. Subgraph federated learning aims to collaboratively train a powerful and generalizable graph mining model without the local systems directly sharing their graph data. In this work, addressing this novel yet realistic setting, we propose two major techniques: (1) FedSage, which trains a GraphSage model based on FedAvg to integrate node features, link structures, and task labels on multiple local subgraphs; (2) FedSage+, which trains a missing neighbor generator alongside FedSage to deal with missing links across local subgraphs. Empirical results on four real-world graph datasets with synthesized subgraph federated learning settings demonstrate the effectiveness and efficiency of our proposed techniques. We also provide consistent theoretical implications regarding their generalization ability on the global graphs.
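To make the FedAvg step underlying FedSage concrete, the following is a minimal sketch of the server-side aggregation only, not the authors' implementation. It assumes each local system reports its model parameters as flat lists of floats together with its number of local training nodes; the names `fedavg`, `client_params`, and `client_sizes` are hypothetical and chosen for illustration.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its local sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weighted mean of each coordinate across all clients.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two local systems: the second holds three times as many labeled nodes.
global_params = fedavg(
    [{"w": [0.0, 4.0]}, {"w": [4.0, 0.0]}],
    [1, 3],
)
print(global_params)  # {'w': [3.0, 1.0]}
```

In a full training loop, each local system would update a GraphSage model on its own subgraph between aggregation rounds, and only the parameters, never the graph data, would be sent to the server.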