A metric on directed graphs and Markov chains based on hitting probabilities

From arXiv. 26 pages, 9 figures. For associated code, visit https://github.com/zboyd2/hitting_probabilities_metric. Accepted at SIAM J. Math. Data Sci.

The shortest-path, commute time, and diffusion distances on undirected graphs have been widely employed in applications such as dimensionality reduction, link prediction, and trip planning. Increasingly, there is interest in using the asymmetric structure of data derived from Markov chains and directed graphs, but few metrics are specifically adapted to this task. We introduce a metric on the state space of any ergodic, finite-state, time-homogeneous Markov chain and, in particular, on any Markov chain derived from a directed graph. Our construction is based on hitting probabilities, with nearness in the metric space related to the transfer of random walkers from one node to another at stationarity. Notably, our metric is insensitive to shortest and average walk distances, thus giving new information compared to existing metrics. We use possible degeneracies in the metric to develop an interesting structural theory of directed graphs and explore a related quotienting procedure. Our metric can be computed in $O(n^3)$ time, where $n$ is the number of states, and in examples we scale up to $n=10,000$ nodes and $\approx 38M$ edges on a desktop computer. In several examples, we explore the nature of the metric, compare it to alternative methods, and demonstrate its utility for weak recovery of community structure in dense graphs, visualization, structure recovery, dynamics exploration, and multiscale cluster detection.
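The abstract does not state the formula, so the following is only a minimal sketch of how a distance built from hitting probabilities could be computed for a small chain. It assumes that the relevant quantity is Q[i, j], the probability that a walker started at i reaches j before returning to i, and that the distance is the negative logarithm of Q[i, j] rescaled by the square root of the ratio of stationary probabilities. That normalization, and every function name below, is an assumption made for illustration; the linked repository is the place to check the authors' actual definition. The sketch also solves one linear system per pair of states, so it is far slower than the $O(n^3)$ cost quoted in the abstract.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution phi of an ergodic row-stochastic matrix P."""
    n = P.shape[0]
    # Stack phi (P - I) = 0 with the normalization sum(phi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    phi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return phi


def hitting_probabilities(P):
    """Q[i, j] = probability that a walk from i hits j before returning to i."""
    n = P.shape[0]
    Q = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # h[k] = probability of reaching j before i when starting at k.
            h = np.zeros(n)
            h[j] = 1.0
            rest = [k for k in range(n) if k != i and k != j]
            if rest:
                A = np.eye(len(rest)) - P[np.ix_(rest, rest)]
                h[rest] = np.linalg.solve(A, P[rest, j])
            # Take one step from i, then continue with h.
            Q[i, j] = P[i] @ h
    return Q


def hitting_distance(P):
    """Illustrative distance: -log of the symmetrized hitting probabilities.

    ASSUMPTION: the sqrt(phi_i / phi_j) scaling is chosen here for
    illustration and is not taken from the abstract.
    """
    phi = stationary_distribution(P)
    Q = hitting_probabilities(P)
    A = np.sqrt(phi[:, None] / phi[None, :]) * Q
    with np.errstate(divide="ignore"):
        D = -np.log(A)
    np.fill_diagonal(D, 0.0)
    return D


# Hypothetical 3-state chain, used only to exercise the functions.
P_example = np.array([[0.1, 0.6, 0.3],
                      [0.2, 0.2, 0.6],
                      [0.7, 0.2, 0.1]])
D_example = hitting_distance(P_example)
```

Under these assumptions the result is symmetric even though the chain is not reversible, because the stationary flow from i to j, phi[i] * Q[i, j], equals the flow from j to i.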

Related content

Markov chain

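A Markov chain, named after Andrey Markov (A. A. Markov, 1856–1922), is a discrete stochastic process with the Markov property. Given the present state, the past (the states visited before the present) is irrelevant for predicting the future (the states that come after the present). At each step of a Markov chain, the system either moves from one state to another or stays in its current state, according to a probability distribution. A change of state is called a transition, and the probabilities associated with the possible changes of state are called transition probabilities. A random walk is an example of a Markov chain: the state at each step is a vertex of a graph, and each step moves to any one of the neighboring vertices with equal probability, regardless of the path taken so far.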
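The random walk described above can be sketched in a few lines. This is an illustrative example rather than part of the original page: the function names and the 3-vertex path graph are hypothetical, and the graph is given as an adjacency matrix in which every vertex is assumed to have at least one neighbor.

```python
import numpy as np


def random_walk_matrix(adj):
    """Transition matrix of the simple random walk on a graph.

    Each row spreads probability equally over the vertex's neighbors.
    """
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)
    if np.any(degree == 0):
        raise ValueError("every vertex needs at least one neighbor")
    return adj / degree[:, None]


def simulate_walk(P, start, steps, seed=0):
    """Sample a path; each move depends only on the current state."""
    rng = np.random.default_rng(seed)
    path = [start]
    for _ in range(steps):
        current = path[-1]
        path.append(int(rng.choice(P.shape[0], p=P[current])))
    return path


# Hypothetical path graph 0 - 1 - 2.
adj_example = np.array([[0, 1, 0],
                        [1, 0, 1],
                        [0, 1, 0]])
P_walk = random_walk_matrix(adj_example)
walk = simulate_walk(P_walk, start=0, steps=10)
```

Row i of the matrix is the transition distribution out of state i, so sampling the next state needs only the current state, which is the Markov property in code.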
