SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-$k$ SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than $10^6$ nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-$k$ SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision up to 7 decimal places within a reasonable query time.
翻译:SimRank是根据图表表层学评估节点到节点相似性的一种流行的测量方法。 近年来,单源和顶值美元SimRank查询由于在网络采矿、社交网络分析和垃圾检测方面的应用而日益受到关注。 但是,研究SimRank的一个根本障碍是缺乏地面真相。 唯一准确的算法,即Power方法,是在10 ⁇ 6美元以上节点的图表上进行计算不可行的计算。 因此,现有工作没有评估查询时间与大型真实世界图的准确性之间的实际权衡。 在本文中,我们介绍ExactSim,这是在大图上计算准确的单一源和顶值美元SimRank结果的第一个算法。极有可能,这种算法产生有严格理论保证的地面真相。 我们在真实世界数据集上进行广泛的实验,以证明ExactSim。 结果显示,ExactSim为任何单一源的SimRank查询提供地面真相,在合理的时间范围内精确到十进制点内的任何单一源SimRank查询。