Federated learning has emerged as an important paradigm for training machine learning models in different domains. For graph-level tasks such as graph classification, graphs can also be regarded as a special type of data sample, which can be collected and stored in separate local systems. Similar to other domains, multiple local systems, each holding a small set of graphs, may benefit from collaboratively training a powerful graph mining model, such as the popular graph neural networks (GNNs). To further motivate such endeavors, we analyze real-world graphs from different domains to confirm that they indeed share certain graph properties that are statistically significant compared with random graphs. However, we also find that different sets of graphs, even from the same domain or the same dataset, are non-IID regarding both graph structures and node features. To handle this, we propose a graph clustered federated learning (GCFL) framework that dynamically finds clusters of local systems based on the gradients of GNNs, and we theoretically justify that such clusters can reduce the structure and feature heterogeneity among graphs owned by the local systems. Moreover, we observe that the gradients of GNNs fluctuate considerably in GCFL, which impedes high-quality clustering, and we design a gradient sequence-based clustering mechanism based on dynamic time warping (GCFL+). Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed frameworks.
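To make the gradient sequence-based clustering idea concrete, the following is a minimal, illustrative sketch rather than the authors' implementation: it clusters clients by the dynamic time warping (DTW) distance between their gradient sequences, in the spirit of GCFL+. The client names, the toy gradient-norm sequences, the average-linkage choice, and the two-cluster split are all assumptions made for illustration.

```python
# Illustrative sketch (not the authors' code): cluster clients whose GNN
# gradient sequences evolve similarly, so each cluster can share one
# aggregated model. Client names and sequences below are toy assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])

# Toy per-client gradient-norm sequences recorded over training rounds.
rng = np.random.default_rng(0)
client_seqs = {
    "client_A": np.cumsum(rng.normal(1.0, 0.1, 20)),
    "client_B": np.cumsum(rng.normal(1.0, 0.1, 20)),
    "client_C": np.cumsum(rng.normal(2.0, 0.1, 20)),
    "client_D": np.cumsum(rng.normal(2.0, 0.1, 20)),
}
names = list(client_seqs)

# Pairwise DTW distances between the clients' gradient sequences.
n = len(names)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = dtw_distance(client_seqs[names[i]], client_seqs[names[j]])
        dist[i, j] = dist[j, i] = d

# Agglomerative clustering on the DTW distance matrix; the two-cluster
# cut is an assumption for this toy example.
labels = fcluster(linkage(squareform(dist), method="average"),
                  t=2, criterion="maxclust")
print(dict(zip(names, labels)))  # e.g. {'client_A': 1, 'client_B': 1, 'client_C': 2, 'client_D': 2}
```

Comparing whole gradient sequences, rather than the gradient at a single round, is what smooths out the round-to-round fluctuation noted above before the clustering decision is made.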