The emergence of massive graph data sets requires fast mining algorithms. Centrality measures to identify important vertices belong to the most popular analysis methods in graph mining. A measure that is gaining attention is forest closeness centrality; it is closely related to electrical measures using current flow but can also handle disconnected graphs. Recently, [Jin et al., ICDM'19] proposed an algorithm to approximate this measure probabilistically. Their algorithm processes small inputs quickly, but does not scale well beyond hundreds of thousands of vertices. In this paper, we first propose a different approximation algorithm; it is up to two orders of magnitude faster and more accurate in practice. Our method exploits the strong connection between uniform spanning trees and forest distances by adapting and extending recent approximation algorithms for related single-vertex problems. This results in a nearly-linear time algorithm with an absolute probabilistic error guarantee. In addition, we are the first to consider the problem of finding an optimal group of vertices w.r.t. forest closeness. We prove that this latter problem is NP-hard; to approximate it, we adapt a greedy algorithm by [Li et al., WWW'19], which is based on (partial) matrix inversion. Moreover, our experiments show that on disconnected graphs, group forest closeness outperforms existing centrality measures in the context of semi-supervised vertex classification.
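To make the measure concrete, here is a minimal exact baseline, not the approximation algorithm summarized above. It assumes the usual definitions, which the abstract does not spell out: the forest matrix is Ω = (L + I)^{-1} for the graph Laplacian L, the forest distance is d(u, v) = Ω[u][u] + Ω[v][v] - 2Ω[u][v], and the forest closeness of u is n divided by the sum of its forest distances to all vertices. The function names and the tiny example graph are hypothetical, and the dense inversion takes cubic time, so it only suits very small graphs.

```python
def forest_matrix(n, edges):
    """Return Omega = (L + I)^{-1} as a dense list of rows (O(n^3) time)."""
    # Build M = L + I for an undirected, unweighted graph.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        m[u][u] += 1.0
        m[v][v] += 1.0
        m[u][v] -= 1.0
        m[v][u] -= 1.0

    # Gauss-Jordan elimination on the augmented matrix [M | I].
    aug = [row + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting; M is positive definite, so a nonzero pivot exists.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def forest_closeness(n, edges):
    """Exact forest closeness of every vertex (assumed definition, see lead-in)."""
    omega = forest_matrix(n, edges)
    scores = []
    for u in range(n):
        # Forest farness: sum of forest distances from u to all vertices.
        farness = sum(omega[u][u] + omega[v][v] - 2.0 * omega[u][v]
                      for v in range(n))
        scores.append(n / farness)
    return scores


if __name__ == "__main__":
    # Path 0 - 1 - 2 plus the isolated vertex 3 (a disconnected graph).
    for vertex, score in enumerate(forest_closeness(4, [(0, 1), (1, 2)])):
        print(vertex, round(score, 4))
```

In this example the isolated vertex still receives a finite score, which illustrates why the measure remains usable on disconnected graphs. The middle vertex of the path scores highest.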