Community detection becomes an important problem with the booming of social networks. As an excellent clustering algorithm, Mean-Shift can not be applied directly to community detection, since Mean-Shift can only handle data with coordinates, while the data in the community detection problem is mostly represented by a graph that can be treated as data with a distance matrix (or similarity matrix). Fortunately, a new clustering algorithm called Medoid-Shift is proposed. The Medoid-Shift algorithm preserves the benefits of Mean-Shift and can be applied to problems based on distance matrix, such as community detection. One drawback of the Medoid-Shift algorithm is that there may be no data points within the neighborhood region defined by a distance parameter. To deal with the community detection problem better, a new algorithm called Revised Medoid-Shift (RMS) in this work is thus proposed. During the process of finding the next medoid, the RMS algorithm is based on a neighborhood defined by KNN, while the original Medoid-Shift is based on a neighborhood defined by a distance parameter. Since the neighborhood defined by KNN is more stable than the one defined by the distance parameter in terms of the number of data points within the neighborhood, the RMS algorithm may converge more smoothly. In the RMS method, each of the data points is shifted towards a medoid within the neighborhood defined by KNN. After the iterative process of shifting, each of the data point converges into a cluster center, and the data points converging into the same center are grouped into the same cluster.
翻译:社区检测是随着社交网络的兴起成为一个重要问题。优秀的聚类算法Mean-Shift不能直接应用于社区检测,因为Mean-Shift只能处理具有坐标的数据,而社区检测问题中的数据大多数是由可以视为具有距离矩阵(或相似性矩阵)的数据表示的图表。幸运的是,提出了一种新的聚类算法Medoid-Shift。Medoid-Shift算法保留了Mean-Shift的优点,可用于基于距离矩阵的问题,例如社区检测。Medoid-Shift算法的一个缺点是,由距离参数定义的邻域区域可能没有数据点。为了更好地处理社区检测问题,在本文中提出了一种名为修订Medoid-Shift(RMS)的新算法。在寻找下一个中心点的过程中,RMS算法基于由KNN定义的邻域,而原始的Medoid-Shift算法基于由距离参数定义的邻域。由于由KNN定义的邻域在邻域中的数据点数量方面比距离参数定义的邻域更稳定,因此RMS算法可能会更平滑地收敛。在RMS方法中,每个数据点都朝向在由KNN定义的邻域中的介质移动。在移位的迭代过程之后,每个数据点都会收敛到一个群集中心,收敛到相同中心的数据点将被分组到同一群集中。