Federated learning has emerged as an important paradigm for training machine learning models across different domains. For graph-level tasks such as graph classification, graphs can be regarded as a special type of data sample that is collected and stored in separate local systems. As in other domains, multiple local systems, each holding a small set of graphs, may benefit from collaboratively training a powerful graph mining model, such as the popular graph neural networks (GNNs). To further motivate such endeavors, we analyze real-world graphs from different domains and confirm that they indeed share certain graph properties that are statistically significant compared with random graphs. However, we also find that different sets of graphs, even from the same domain or the same dataset, are non-IID with respect to both graph structures and node features. To handle this, we propose a graph clustered federated learning (GCFL) framework that dynamically finds clusters of local systems based on the gradients of GNNs, and we theoretically justify that such clusters can reduce the structure and feature heterogeneity among the graphs owned by the local systems. Moreover, we observe that the gradients of GNNs fluctuate considerably in GCFL, which impedes high-quality clustering, and we therefore design a clustering mechanism based on dynamic time warping over gradient sequences (GCFL+). Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed frameworks.
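To make the gradient-sequence clustering idea more concrete, the following is a minimal sketch, not the authors' implementation: each client (local system) is summarized by a 1-D sequence of GNN gradient norms recorded over communication rounds, pairwise sequence similarity is measured with dynamic time warping (DTW), and clients are grouped by hierarchical clustering. The client count, sequence length, noise level, and distance threshold below are illustrative placeholders.

```python
# Sketch of DTW-based clustering of clients' gradient-norm sequences,
# in the spirit of GCFL+ (illustrative only; hyperparameters are assumptions).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def cluster_clients(grad_seqs: list, threshold: float = 1.0) -> np.ndarray:
    """Group clients whose gradient-norm sequences are close under the DTW distance."""
    k = len(grad_seqs)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(grad_seqs[i], grad_seqs[j])
    # Agglomerative clustering on the pairwise DTW distance matrix.
    Z = linkage(squareform(dist), method="complete")
    return fcluster(Z, t=threshold, criterion="distance")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: clients 0-2 and 3-5 follow different gradient dynamics.
    seqs = [np.exp(-0.1 * np.arange(20)) + 0.05 * rng.standard_normal(20) for _ in range(3)]
    seqs += [np.exp(-0.5 * np.arange(20)) + 0.05 * rng.standard_normal(20) for _ in range(3)]
    print(cluster_clients(seqs, threshold=1.0))  # prints a cluster label per client
```

In the actual GCFL+ framework the clustering signal comes from the GNN gradients communicated during federated training; the synthetic exponential sequences above merely stand in for such gradient-norm trajectories.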