Given entities and their interactions in web data, which may have occurred at different times, how can we find communities of entities and track their evolution? In this paper, we approach this important task from a graph clustering perspective. Deep clustering methods have recently achieved state-of-the-art clustering performance in various domains. In particular, deep graph clustering (DGC) methods have successfully extended deep clustering to graph-structured data by learning node representations and cluster assignments in a joint optimization framework. Despite some differences in modeling choices (e.g., encoder architectures), existing DGC methods are mainly based on autoencoders and use the same clustering objective with relatively minor adaptations. Moreover, although many real-world graphs are dynamic, previous DGC methods consider only static graphs. In this work, we develop CGC, a novel end-to-end framework for graph clustering that fundamentally differs from existing methods. CGC learns node embeddings and cluster assignments in a contrastive graph learning framework, where positive and negative samples are carefully selected in a multi-level scheme so that they reflect hierarchical community structures and network homophily. We also extend CGC to time-evolving data, performing temporal graph clustering in an incremental learning fashion with the ability to detect change points. Extensive evaluation on real-world graphs demonstrates that the proposed CGC consistently outperforms existing methods.
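The abstract does not spell out CGC's actual loss or sampling rules. As a rough illustration of the general idea of contrastive learning with structure-aware positives and negatives, the following minimal Python sketch computes a multi-positive InfoNCE-style loss over node embeddings. The two-level sampling rule (graph neighbours plus same-cluster nodes as positives, all other nodes as negatives), the cosine similarity, the temperature `tau`, and every function name here are illustrative assumptions, not the paper's multi-level scheme or objective.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def sample_sets(adj: Dict[int, Set[int]], clusters: List[int], i: int) -> Tuple[Set[int], Set[int]]:
    """Hypothetical two-level sampling (an assumption, not CGC's scheme):
    positives = graph neighbours of i (homophily) plus nodes in i's current cluster;
    negatives = every other node except i itself."""
    n = len(clusters)
    pos = set(adj.get(i, set())) | {j for j in range(n) if clusters[j] == clusters[i]}
    pos.discard(i)
    neg = {j for j in range(n) if j != i and j not in pos}
    return pos, neg


def info_nce(z: List[List[float]], i: int, pos: Set[int], neg: Set[int], tau: float = 0.5) -> float:
    """Multi-positive InfoNCE loss for anchor node i; returns 0.0 if i has no positives."""
    if not pos:
        return 0.0
    num = sum(math.exp(cosine(z[i], z[p]) / tau) for p in pos)
    den = num + sum(math.exp(cosine(z[i], z[k]) / tau) for k in neg)
    return -math.log(num / den)


def total_loss(z: List[List[float]], adj: Dict[int, Set[int]], clusters: List[int], tau: float = 0.5) -> float:
    """Average the per-anchor loss over all nodes."""
    losses = [info_nce(z, i, *sample_sets(adj, clusters, i), tau=tau) for i in range(len(z))]
    return sum(losses) / len(losses)


# Toy graph: two communities {0, 1} and {2, 3}.
adj = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
clusters = [0, 0, 1, 1]
z_aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # neighbours point the same way
z_mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]    # neighbours point apart
print(total_loss(z_aligned, adj, clusters), total_loss(z_mixed, adj, clusters))
```

Embeddings that agree with the community structure receive a lower loss than embeddings that pull neighbours apart, which is the signal a contrastive objective uses to shape node representations. In a trained model, `z` would come from a graph encoder and `clusters` from the current cluster assignments; both are fixed toy inputs here.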