Semi-supervised and unsupervised machine learning methods often rely on graphs to model data, prompting research on how theoretical properties of operators on graphs can be leveraged in learning problems. While most of the existing literature focuses on undirected graphs, directed graphs are very important in practice, providing models for physical, biological, and transportation networks, among many other applications. In this paper, we propose a new framework for rigorously studying continuum limits of learning algorithms on directed graphs. We use the new framework to study the PageRank algorithm and show how it can be interpreted as a numerical scheme on a directed graph involving a type of normalized graph Laplacian. We show that the corresponding continuum limit problem, obtained as the number of webpages grows to infinity, is a second-order, possibly degenerate, elliptic equation that contains reaction, diffusion, and advection terms. We prove that the numerical scheme is consistent and stable, and we compute explicit rates of convergence of the discrete solution to the solution of the continuum limit PDE. We give applications to proving stability and asymptotic regularity of the PageRank vector. Finally, we illustrate our results with numerical experiments and explore an application to data depth.
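For concreteness, a minimal sketch of the algebra behind this interpretation (using the classical PageRank formulation; the abstract does not fix the notation, so the symbols below are illustrative assumptions): on a directed graph with column-stochastic transition matrix $P$, teleportation distribution $v$, and damping parameter $\alpha \in (0,1)$, the PageRank vector $x$ satisfies
\[
x = \alpha P x + (1-\alpha)\, v,
\qquad\text{equivalently}\qquad
(I - \alpha P)\, x = (1-\alpha)\, v .
\]
In this form PageRank is a linear equation driven by the Laplacian-type operator $I - \alpha P$; the specific normalized graph Laplacian, its continuum limit, and the resulting reaction--diffusion--advection PDE studied in the paper refine this elementary identity rather than coincide with it.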