Many network analysis and graph learning techniques are based on random walk models, which require inferring transition matrices that formalize the underlying stochastic process in an observed graph. For weighted graphs, it is common to estimate the entries of such transition matrices from the relative weights of edges. However, we are often confronted with incomplete data, which turns the construction of the transition matrix from a weighted graph into an inference problem. Moreover, we often have access to additional information that captures topological constraints of the system, i.e., which edges in a weighted graph are (theoretically) possible and which are not. Examples include transportation networks, where we have access to passenger trajectories as well as the physical topology of connections, and sets of social interactions observed together with the underlying social structure. Combining these two sources of information to infer transition matrices is an open challenge, with implications for downstream network analysis tasks. Addressing this issue, we show that including knowledge of such topological constraints can improve the inference of transition matrices, especially for small datasets. We derive an analytically tractable Bayesian method that uses repeated interactions and a topological prior to infer transition matrices data-efficiently. We compare it against commonly used frequentist and Bayesian approaches on both synthetic and real-world datasets, and we find that it recovers the transition probabilities with higher accuracy and that it is robust even when knowledge of the topological constraints is only partial. Lastly, we show that this higher accuracy improves the results of downstream network analysis tasks such as cluster detection and node ranking, which highlights the practical relevance of our method for analyses of various networked systems.
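To make the contrast between the two kinds of estimate concrete, the following is a minimal sketch and not the method derived in this work. It assumes a textbook row-wise Dirichlet prior whose support is restricted to the topologically allowed edges, and compares its posterior mean with the usual estimate from relative edge weights. The function names, the `allowed` mask, the pseudo-count `alpha`, and the choice to discard observations on forbidden edges are all illustrative assumptions.

```python
import numpy as np


def mle_transition_matrix(counts):
    """Row-normalise observed edge weights (the common frequentist estimate)."""
    counts = np.asarray(counts, dtype=float)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows without any observation are left as all zeros.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)


def posterior_mean_transition_matrix(counts, allowed, alpha=1.0):
    """Posterior mean under a row-wise Dirichlet prior on allowed edges only.

    Assumption: every allowed edge receives the same pseudo-count `alpha`;
    forbidden edges get probability zero, and any observations recorded
    on them are discarded.
    """
    counts = np.asarray(counts, dtype=float)
    allowed = np.asarray(allowed, dtype=float)
    pseudo = (counts + alpha) * allowed
    row_sums = pseudo.sum(axis=1, keepdims=True)
    # Rows with no allowed outgoing edge are left as all zeros.
    return np.divide(pseudo, row_sums,
                     out=np.zeros_like(pseudo), where=row_sums > 0)


# Hypothetical toy data: 3 nodes, self-loops forbidden, few observations.
counts = np.array([[0, 3, 0],
                   [1, 0, 0],
                   [0, 0, 0]])
allowed = np.array([[0, 1, 1],
                    [1, 0, 1],
                    [1, 1, 0]])

T_mle = mle_transition_matrix(counts)
T_post = posterior_mean_transition_matrix(counts, allowed, alpha=1.0)
```

In this toy example the frequentist estimate assigns zero probability to every unobserved edge and leaves the third row undefined (all zeros), whereas the constrained posterior mean spreads probability over allowed but unobserved edges and keeps forbidden edges at zero.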