Contrastive deep graph clustering, which aims to divide nodes into disjoint groups via contrastive mechanisms, is a challenging research topic. Among recent works, hard sample mining-based algorithms have attracted considerable attention for their promising performance. However, we find that existing hard sample mining methods have two problems. 1) In the hardness measurement, important structural information is overlooked in the similarity calculation, degrading the representativeness of the selected hard negative samples. 2) Previous works focus only on hard negative sample pairs while neglecting hard positive sample pairs. Nevertheless, samples within the same cluster but with low similarity should also be carefully learned. To solve these problems, we propose a novel contrastive deep graph clustering method dubbed Hard Sample Aware Network (HSAN), which introduces a comprehensive similarity measure criterion and a general dynamic sample weighting strategy. Concretely, in our algorithm, the similarities between samples are calculated by considering both the attribute embeddings and the structure embeddings, which better reveals sample relationships and assists hardness measurement. Moreover, under the guidance of carefully collected high-confidence clustering information, our proposed weight modulating function first recognizes the positive and negative samples and then dynamically up-weights the hard sample pairs while down-weighting the easy ones. In this way, our method can mine not only hard negative samples but also hard positive samples, thus further improving the discriminative capability of the samples. Extensive experiments and analyses demonstrate the superiority and effectiveness of our proposed method.
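The two ideas summarized above, a similarity that mixes attribute and structure embeddings and a weight that grows with pair hardness, can be illustrated with a small NumPy sketch. This is not the HSAN implementation: the function names, the fixed mixing coefficient `alpha`, the min-max normalization, and the focusing exponent `beta` are all assumptions chosen for illustration, and the exact formulation used by the method may differ.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def combined_similarity(attr_emb, struct_emb, alpha=0.5):
    """Mix attribute-based and structure-based cosine similarity.

    `alpha` is an assumed, fixed trade-off in [0, 1]; it is hypothetical here.
    """
    za = l2_normalize(attr_emb)
    zs = l2_normalize(struct_emb)
    return alpha * (za @ za.T) + (1.0 - alpha) * (zs @ zs.T)


def pair_weights(sim, pseudo_labels, confident, beta=2.0):
    """Weight each sample pair by how hard it is (illustrative form only).

    For pairs where both samples are high-confidence:
      - same pseudo-label (positive pair): weight grows as similarity drops;
      - different pseudo-label (negative pair): weight grows as similarity rises.
    All other pairs keep a neutral weight of 1.
    """
    # Min-max normalise similarities into [0, 1] so they are comparable to 0/1 labels.
    span = max(sim.max() - sim.min(), 1e-12)
    s = (sim - sim.min()) / span
    # 1.0 where the two samples share a pseudo-label, 0.0 otherwise.
    same = (pseudo_labels[:, None] == pseudo_labels[None, :]).astype(float)
    both_confident = confident[:, None] & confident[None, :]
    hardness = np.abs(same - s) ** beta
    return np.where(both_confident, hardness, 1.0)
```

In this sketch an easy pair (a confident positive with high similarity, or a confident negative with low similarity) gets a weight near 0, while a hard pair of either kind gets a weight near 1, so hard positives and hard negatives are handled by the same expression.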