Shortest-path, commute-time, and diffusion distances on undirected graphs have been widely employed in applications such as dimensionality reduction, link prediction, and trip planning. There is growing interest in exploiting the asymmetric structure of data derived from Markov chains and directed graphs, but few metrics are specifically adapted to this task. We introduce a metric on the state space of any ergodic, finite-state, time-homogeneous Markov chain and, in particular, on any Markov chain derived from a directed graph. Our construction is based on hitting probabilities, with nearness in the metric space related to the transfer of random walkers from one node to another at stationarity. Notably, our metric is insensitive to shortest and average walk distances, and thus provides new information compared to existing metrics. We use possible degeneracies in the metric to develop an interesting structural theory of directed graphs and explore a related quotienting procedure. Our metric can be computed in $O(n^3)$ time, where $n$ is the number of states, and in examples we scale up to $n = 10{,}000$ nodes and approximately 38 million edges on a desktop computer. In several examples, we explore the nature of the metric, compare it to alternative methods, and demonstrate its utility for weak recovery of community structure in dense graphs, visualization, structure recovery, dynamics exploration, and multiscale cluster detection.
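To make the ingredients named above concrete, the following is a minimal sketch of how a stationary distribution and a table of hitting probabilities might be computed for a small ergodic chain. It is not the construction of the metric itself, which the abstract does not spell out. Several things here are assumptions: the reading of "hitting probability" as the probability that a walk started at state $i$ reaches state $j$ before returning to $i$, the function names `stationary_distribution` and `hitting_probabilities`, and the 4-state example matrix `P`. The sketch solves one small linear system per pair of states, so it is far slower than the $O(n^3)$ cost quoted above and is meant only for tiny examples.

```python
import numpy as np

def stationary_distribution(P):
    """Return pi with pi @ P = pi and sum(pi) = 1 (hypothetical helper)."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the normalization row.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def hitting_probabilities(P):
    """Q[i, j]: assumed to mean P(walk from i reaches j before returning to i)."""
    n = P.shape[0]
    Q = np.ones((n, n))  # diagonal left at 1 by convention in this sketch
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rest = [k for k in range(n) if k not in (i, j)]
            # h[k] = P(walk from k hits j before i); boundary values h[i]=0, h[j]=1.
            h = np.zeros(n)
            h[j] = 1.0
            if rest:
                # For k in rest: h[k] = sum_l P[k, l] * h[l].
                A = np.eye(len(rest)) - P[np.ix_(rest, rest)]
                h[rest] = np.linalg.solve(A, P[rest, j])
            # Take one step from i, then ask whether j is hit before i.
            Q[i, j] = P[i] @ h
    return Q

# Hypothetical ergodic chain on a small directed graph (rows sum to 1).
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.5],
    [0.5, 0.5, 0.0, 0.0],
])
pi = stationary_distribution(P)
Q = hitting_probabilities(P)
print(pi)
print(Q)
```

A useful sanity check on the output is the identity $\pi_i Q_{ij} = \pi_j Q_{ji}$, which holds for these return-avoiding hitting probabilities on an ergodic chain even when the transition matrix itself is not symmetric.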