Rethinking Graph Autoencoder Models for Attributed Graph Clustering

Most recent graph clustering methods rely on Graph Auto-Encoders (GAEs) to perform clustering and embedding learning jointly. However, two critical issues have been overlooked. First, the error accumulated by learning from noisy clustering assignments degrades the effectiveness and robustness of the clustering model. This problem is called Feature Randomness. Second, reconstructing the adjacency matrix leads the model to learn similarities that are irrelevant to the clustering task. This problem is called Feature Drift. The theoretical relation between these two problems has not yet been investigated. We study them from two aspects: (1) a trade-off exists between Feature Randomness and Feature Drift when clustering and reconstruction are performed at the same level, and (2) Feature Drift is more pronounced for GAE models than for vanilla auto-encoder models, due to the graph convolutional operation and the graph decoding design. Motivated by these findings, we reformulate the GAE-based clustering methodology. Our solution is two-fold. First, we propose a sampling operator $\Xi$ that triggers a protection mechanism against noisy clustering assignments. Second, we propose an operator $\Upsilon$ that triggers a correction mechanism against Feature Drift by gradually transforming the reconstructed graph into a clustering-oriented one. As its principal advantages, our solution considerably improves clustering effectiveness and robustness and can be easily tailored to existing GAE models.
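To make the roles of the two operators more concrete, the following is a minimal illustrative sketch in plain Python. It is not the formulation used in the paper: the abstract does not define $\Xi$ or $\Upsilon$, so the confidence-threshold rule, the edge-rewriting rule, and the names `sample_reliable_nodes`, `correct_graph`, and `threshold` are all assumptions made here for illustration only.

```python
def sample_reliable_nodes(soft_assignments, threshold=0.8):
    """Hypothetical stand-in for the sampling operator Xi.

    Assumption: a node counts as reliable when its highest soft
    cluster-assignment probability reaches `threshold`.
    """
    return [
        node
        for node, probs in enumerate(soft_assignments)
        if max(probs) >= threshold
    ]


def correct_graph(adjacency, soft_assignments, reliable):
    """Hypothetical stand-in for the correction operator Upsilon.

    Assumption: among reliable nodes only, connect pairs assigned to the
    same cluster and disconnect pairs assigned to different clusters.
    Edges touching unreliable nodes are left unchanged, so the target
    graph changes gradually as the reliable set grows.
    """
    # Hard label = index of the largest assignment probability.
    labels = [max(range(len(p)), key=p.__getitem__) for p in soft_assignments]
    corrected = [row[:] for row in adjacency]  # copy; do not mutate the input
    for pos, i in enumerate(reliable):
        for j in reliable[pos + 1:]:
            value = 1 if labels[i] == labels[j] else 0
            corrected[i][j] = value
            corrected[j][i] = value
    return corrected


# Toy example: 4 nodes, 2 clusters (values chosen arbitrarily).
soft = [[0.9, 0.1], [0.85, 0.15], [0.55, 0.45], [0.1, 0.9]]
adj = [
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]
reliable = sample_reliable_nodes(soft, threshold=0.8)
target_graph = correct_graph(adj, soft, reliable)
```

In this toy example, node 2 is excluded from the reliable set because its assignment is ambiguous, so its edges are kept as they are, while the edge between nodes 0 and 3 (different clusters) is removed and an edge between nodes 0 and 1 (same cluster) is added. In a training loop, one would presumably restrict the clustering loss to the sampled nodes and use the corrected graph as the reconstruction target, but that wiring is likewise an assumption.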
