Graph clustering, which partitions data using graph information, has recently attracted considerable research interest. However, one limitation of most graph-based methods is that they assume the graph structure they operate on is fixed and reliable. In practice, a graph inevitably contains some edges that are not conducive to clustering, which we call spurious edges. This paper is the first attempt to employ a graph pooling technique for node clustering. We propose a novel dual graph embedding network (DGEN), designed as a two-step graph encoder whose two stages are connected by a graph pooling layer, to learn the graph embedding. Our model assumes that if a node and its nearest neighboring node are close to the same cluster center, then the node is informative and the edge between them can be considered cluster-friendly. Based on this assumption, we devise neighbor cluster pooling (NCPool), which selects the most informative subset of nodes and their corresponding edges according to the distances of the nodes and their nearest neighbors to the cluster centers. This effectively alleviates the impact of spurious edges on clustering. Finally, to obtain cluster assignments for all nodes, a classifier is trained on the clustering results of the selected nodes. Experiments on five benchmark graph datasets demonstrate the superiority of the proposed method over state-of-the-art algorithms.
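The selection rule behind NCPool can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `ncpool_select`, the `ratio` parameter, the additive distance score, and the use of the induced subgraph for the retained edges are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def ncpool_select(embeddings, adjacency, centers, ratio=0.5):
    """Hypothetical sketch of neighbor-cluster node selection.

    embeddings: (n, d) node embeddings
    adjacency:  (n, n) symmetric 0/1 adjacency matrix
    centers:    (k, d) cluster centers
    ratio:      assumed fraction of nodes to keep at most
    """
    n = embeddings.shape[0]

    # Distance from every node to every cluster center, shape (n, k).
    dist_to_centers = np.linalg.norm(
        embeddings[:, None, :] - centers[None, :, :], axis=2
    )
    nearest_center = dist_to_centers.argmin(axis=1)

    # Pairwise node distances, restricted to existing edges (no self-loops).
    pair = np.linalg.norm(
        embeddings[:, None, :] - embeddings[None, :, :], axis=2
    )
    mask = adjacency > 0
    np.fill_diagonal(mask, False)
    pair = np.where(mask, pair, np.inf)
    nearest_nbr = pair.argmin(axis=1)
    has_nbr = mask.any(axis=1)

    # A node counts as informative only if it and its nearest neighbor
    # are closest to the same center (the assumption stated above).
    agree = has_nbr & (nearest_center[nearest_nbr] == nearest_center)

    # Assumed score: distance of the node plus distance of its nearest
    # neighbor to the shared center; smaller means more informative.
    rows = np.arange(n)
    score = (
        dist_to_centers[rows, nearest_center]
        + dist_to_centers[nearest_nbr, nearest_center]
    )
    score = np.where(agree, score, np.inf)

    # Keep at most ratio * n nodes, dropping any that failed the test.
    k_keep = max(1, int(round(ratio * n)))
    order = np.argsort(score, kind="stable")[:k_keep]
    selected = np.sort(order[np.isfinite(score[order])])

    # Assumed edge rule: keep the subgraph induced by the selected nodes.
    sub_adj = adjacency[np.ix_(selected, selected)]
    return selected, sub_adj
```

In this sketch, a node whose nearest neighbor lies closer to a different center is discarded, and every edge incident to it disappears with it, which is one simple way spurious edges could be filtered out before the second encoding step.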