Given entities and their interactions in web data, which may have occurred at different times, how can we find communities of entities and track their evolution? In this paper, we approach this important task from a graph clustering perspective. Recently, deep clustering methods have achieved state-of-the-art clustering performance in various domains. In particular, deep graph clustering (DGC) methods have successfully extended deep clustering to graph-structured data by learning node representations and cluster assignments in a joint optimization framework. Despite some differences in modeling choices (e.g., encoder architectures), existing DGC methods are mainly based on autoencoders and use the same clustering objective with relatively minor adaptations. Moreover, while many real-world graphs are dynamic, previous DGC methods have considered only static graphs. In this work, we develop CGC, a novel end-to-end framework for graph clustering that fundamentally differs from existing methods. CGC learns node embeddings and cluster assignments in a contrastive graph learning framework, where positive and negative samples are carefully selected in a multi-level scheme such that they reflect hierarchical community structures and network homophily. We also extend CGC to time-evolving data, where temporal graph clustering is performed in an incremental learning fashion, with the ability to detect change points. Extensive evaluation on real-world graphs demonstrates that the proposed CGC consistently outperforms existing methods.